Customer Lifetime Value Prediction

6-Month LTV

The histogram clearly shows that some customers have negative LTV, and there are a few outliers as well. Filtering out the outliers makes sense before we train a machine learning model.
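As a rough sketch of the outlier filter, assuming the 6-month LTV lives in a hypothetical `m6_Revenue` column and that we cut at the 90th percentile (both the column name and the cutoff are assumptions for illustration, and the data is made up):

```python
import pandas as pd

# Made-up 6-month revenue per customer, including a negative LTV and one extreme value.
ltv_6m = pd.DataFrame({
    "CustomerID": range(1, 11),
    "m6_Revenue": [120.0, -35.0, 80.0, 45.0, 210.0, 15.0, 60.0, 95.0, 150.0, 25000.0],
})

# Assumed cutoff: keep customers at or below the 90th percentile of 6-month revenue.
upper = ltv_6m["m6_Revenue"].quantile(0.90)
ltv_filtered = ltv_6m[ltv_6m["m6_Revenue"] <= upper]
```

Note that this only trims the upper tail, so customers with negative LTV stay in the data. The right cutoff depends on how heavy the tail of your own distribution is.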

We will merge the 3-month and 6-month dataframes to see the correlations between LTV and our feature set.
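A minimal sketch of the merge, assuming two hypothetical dataframes keyed by `CustomerID`: `features_3m` with the 3-month features and `ltv_6m` with the 6-month revenue. Treating customers with no 6-month purchases as zero revenue is also an assumption:

```python
import pandas as pd

# Hypothetical 3-month feature set (OverallScore stands in for the RFM score).
features_3m = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "OverallScore": [1, 3, 5, 7],
})

# Hypothetical 6-month revenue; customer 3 made no purchases in that window.
ltv_6m = pd.DataFrame({
    "CustomerID": [1, 2, 4],
    "m6_Revenue": [50.0, 180.0, 900.0],
})

# Left merge keeps every customer from the feature set.
merged = pd.merge(features_3m, ltv_6m, on="CustomerID", how="left")

# Assumption: a missing 6-month revenue means the customer spent nothing.
merged["m6_Revenue"] = merged["m6_Revenue"].fillna(0)
```

A left merge is used so that customers who were active in the first 3 months but not afterwards are not silently dropped.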

We can see a positive relationship here: a higher RFM score means a higher LTV.

Modeling Approach

From a business perspective, we need to treat customers differently based on their predicted LTV. For this example, we will apply clustering and create 3 segments (the right number of segments depends on your business dynamics and goals):

  1. Low LTV
  2. Mid LTV
  3. High LTV
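One way to build these three segments is K-means on the 6-month revenue. The sketch below assumes scikit-learn's `KMeans` and a hypothetical `m6_Revenue` column; the relabeling step is our own choice so that cluster 0 is Low, 1 is Mid and 2 is High:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up 6-month revenue with three clearly separated groups.
ltv = pd.DataFrame({
    "m6_Revenue": [10.0, 20.0, 15.0, 300.0, 320.0, 310.0, 2000.0, 2100.0, 1900.0],
})

# Fit 3 clusters on the single revenue feature.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
ltv["RawCluster"] = kmeans.fit_predict(ltv[["m6_Revenue"]])

# K-means labels are arbitrary, so relabel clusters by mean revenue:
# 0 = Low LTV, 1 = Mid LTV, 2 = High LTV.
order = ltv.groupby("RawCluster")["m6_Revenue"].mean().sort_values().index
mapping = {raw: rank for rank, raw in enumerate(order)}
ltv["LTVCluster"] = ltv["RawCluster"].map(mapping)
```

Ordering the labels by mean revenue makes the cluster number meaningful, which helps when reading correlations and classification reports later.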

Feature Modeling

Getting Dummy Variables
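Categorical columns need to be converted to numeric indicator columns before modeling. A small sketch with `pd.get_dummies`, assuming a hypothetical categorical `Segment` column:

```python
import pandas as pd

# Made-up customers with one categorical column.
customers = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Segment": ["Low-Value", "Mid-Value", "High-Value"],
    "OverallScore": [1, 4, 7],
})

# Replace Segment with one indicator column per category.
encoded = pd.get_dummies(customers, columns=["Segment"])
dummy_cols = [c for c in encoded.columns if c.startswith("Segment_")]
```

Each row ends up with exactly one indicator set among the `Segment_*` columns, and the original `Segment` column is dropped.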

Calculating Correlation
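To see which features move together with the target, we can look at the `LTVCluster` column of the correlation matrix. The feature names and values below are made up for illustration:

```python
import pandas as pd

# Made-up features and cluster labels (0 = Low, 1 = Mid, 2 = High).
data = pd.DataFrame({
    "Recency": [5, 40, 90, 10, 70, 120],
    "Frequency": [30, 12, 3, 25, 6, 1],
    "Revenue": [900.0, 300.0, 40.0, 700.0, 120.0, 10.0],
    "LTVCluster": [2, 1, 0, 2, 1, 0],
})

# Pearson correlation of every column with the target, strongest positive first.
corr_with_target = data.corr()["LTVCluster"].sort_values(ascending=False)
```

In this toy data Recency correlates negatively with the cluster (recent buyers sit in higher clusters), while Frequency and Revenue correlate positively.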

Separate the independent variables and the target variable (LTVCluster) into X and y respectively, then split the data into train and test sets.
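A sketch of this step with scikit-learn's `train_test_split`. The 20% test size, the stratification and the feature names are assumptions for illustration, not the settings used in this analysis:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up feature set with an imbalanced target, similar in spirit to LTV clusters.
data = pd.DataFrame({
    "Recency": list(range(20)),
    "Frequency": [i % 7 for i in range(20)],
    "Revenue": [10.0 * i for i in range(20)],
    "LTVCluster": [0] * 10 + [1] * 5 + [2] * 5,
})

# X holds the independent variables, y the target.
X = data.drop(columns=["LTVCluster"])
y = data["LTVCluster"]

# Assumed settings: 20% test split, stratified so every cluster appears in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```

Stratifying on `y` is worth considering when the clusters are imbalanced, since a plain random split can leave the smaller clusters almost absent from the test set.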

Model Building

Precision and recall are acceptable for cluster 0 (Low LTV). When the model tells us a customer belongs to cluster 0, it is correct 90 times out of 100 (precision), and it successfully identifies 93% of actual cluster 0 customers (recall). We really need to improve the model for the other clusters. For example, we detect only 42% of Mid LTV customers. Possible actions to improve those points: